Modeling Complement Types in Phrase-Based SMT
Abstract
We explore two approaches to modeling complement types (NPs and PPs) in an English-to-German SMT system. A simple abstract representation inserts pseudo-prepositions that mark the beginning of noun phrases, both to improve the symmetry of source and target complement types and to provide flat structural information on phrase boundaries. An extension of this representation generates context-aware synthetic phrase-table entries conditioned on the source side, to model complement types in terms of grammatical case and preposition choice. Both the simple preposition-informed system and the context-aware system significantly improve over the baseline, and the context-aware system is slightly better than the system without context information.
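The pseudo-preposition idea can be illustrated with a minimal sketch. The abstract does not give the marker symbol, how noun phrases are detected, or which side of the corpus is annotated, so the token `<np>`, the function name `mark_noun_phrases`, and the span format below are all assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical marker token; the symbol actually used in the paper is not given here.
PSEUDO_PREP = "<np>"


def mark_noun_phrases(tokens: List[str], np_spans: List[Tuple[int, int]]) -> List[str]:
    """Insert a pseudo-preposition token before the first word of each noun phrase.

    np_spans holds (start, end) token offsets with end exclusive. They are
    assumed to come from an external chunker or parser and not to overlap.
    """
    starts = {start for start, _ in np_spans}
    marked: List[str] = []
    for i, token in enumerate(tokens):
        if i in starts:
            # Flat boundary marker: signals that a bare NP begins here.
            marked.append(PSEUDO_PREP)
        marked.append(token)
    return marked


tokens = "the committee discussed the proposal".split()
print(" ".join(mark_noun_phrases(tokens, [(0, 2), (3, 5)])))
```

With this marking, a bare NP occupies the same surface position a preposition would in a PP, which is one way to read the "symmetry" the abstract refers to.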
Similar resources
Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge TM result into the final output, the proposed models refer to the corresponding TM information associ...
Collective Corpus Weighting and Phrase Scoring for SMT Using Graph-Based Random Walk
Data quality is one of the key factors in Statistical Machine Translation (SMT). Previous research addressed the data quality problem in SMT by corpus weighting or phrase scoring, but these two types of methods were often investigated independently. To leverage the dependencies between them, we propose an intuitive approach to improve translation modeling by collective corpus weighting and phra...
A CCG-based Quality Estimation Metric for Statistical Machine Translation
We describe a metric for estimating the quality of Statistical Machine Translation (SMT) output based on syntactic features extracted using Combinatory Categorial Grammar (CCG). CCG has been demonstrated to be better suited to deal with SMT texts than context free phrase structure grammar formalisms. We use CCG features to estimate the grammaticality of the translations by dividing them into ma...
Dependency Relations as Source Context in Phrase-Based SMT
The Phrase-Based Statistical Machine Translation (PB-SMT) model has recently begun to include source context modeling, under the assumption that the proper lexical choice of an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features such as words, parts-of-speech, and supertags have been explored as effective source context in SMT. ...
Chained System: A Linear Combination of Different Types of Statistical Machine Translation Systems
The paper explores a way to learn post-editing fixes of raw MT outputs automatically by combining two different types of statistical machine translation (SMT) systems in a linear fashion. Our proposed system (which we call a chained system) consists of two SMT systems: (i) a syntax-based SMT system and (ii) a phrase-based SMT system (Koehn, 2004). We first translate source sentences of the bite...
Publication date: 2016